DIFFERENTIAL METRICS IN PROBABILITY
SPACES BASED ON ENTROPY AND DIVERGENCE
MEASURES

C.  Radhakrishna  Rao 


SUMMARY: This paper discusses some general methods of metrizing probability
spaces through the introduction of a quadratic differential metric in the
parameter manifold of a set of probability distributions. These methods extend
the investigation made in Rao (1945), where the Fisher information matrix was
used to construct the metric and the geodesic distance was suggested as a
measure of dissimilarity between probability distributions.

The basic approach in the present paper is first to construct a divergence
or dissimilarity measure between any two probability distributions, and then to
use it to derive a differential metric by considering two distributions whose
characterizing parameters are close to each other. One measure of divergence
considered is the Jensen difference based on an entropy functional as defined in
Rao (1982a). Another is the f-divergence measure studied by Csiszár (1967). The
latter class leads to the differential metric based on the Fisher information
matrix. The geodesic distances based on this metric computed by various authors
are listed.

1. INTRODUCTION


In an early paper (Rao, 1945), the author introduced a Riemannian (quadratic
differential) metric over the space of a parametric family of probability
distributions and proposed the geodesic distance induced by the metric as a
measure of dissimilarity between probability distributions. The metric was
based on the Fisher information matrix and it arose in a natural way through the
concepts of statistical discrimination (Rao, 1949, 1954, 1973 pp. 329-332, 1982a).
Such a choice of the quadratic differential metric, which we will refer to as
the information metric, has indeed some attractive properties such as invariance
under transformation of the variables as well as the parameters. It also
seems to provide an appropriate (informative) geometry on the probability space
for studying large sample properties of estimators of parameters in terms of
simple loss functions, as demonstrated by Amari (1982, 1983), Čencov (1982),
Efron (1975, 1982), Eguchi (1983, 1984) and others.

The geodesic distances based on the information metric have been computed
for a number of parametric families of distributions in recent papers by Atkinson
and Mitchell (1981), Burbea (1984), Mitchell and Krzanowski (1985), and Oller
and Cuadras (1985).

In two papers, Burbea and Rao (1982a, 1982b) gave some general methods for
constructing quadratic differential metrics on probability spaces, with the
Fisher information metric belonging to a special class. In view of the rich
variety of possible metrics, it would be useful to lay down some criteria for
the choice of an appropriate metric for a given problem. Amari has stated
that a metric should reflect the stochastic and statistical properties of the
family of probability distributions. In particular he emphasized the invariance
of the metric under transformations of the variables as well as the parameters.
Čencov (1972) shows that the Fisher information metric is unique under some
conditions including invariance. Burbea and Rao (1982a) showed that the Fisher
information metric is the only metric associated with invariant divergence
measures of the type introduced by Csiszár (1967). However, there exist other
types of invariant metrics as shown in Section 3 of this paper.

The choice of a metric naturally depends on the particular problem under
investigation, and invariance may or may not be relevant. For instance, consider
the space of multinomial distributions, $A = \{(p_1,\ldots,p_n) : p_i > 0,\ \sum p_i = 1\}$,
which is a submanifold of the positive orthant $X = \{(x_1,\ldots,x_n) : x_i > 0\}$
of the Euclidean space $R^n$. A Riemannian metric on X automatically provides a
metric on the sub-


biometric  studies 


The object of the present paper is to provide some general methods of
constructing Riemannian metrics on probability spaces, and to discuss in particular
the metric generated by the quadratic entropy, which is an ideal measure of
diversity (see Lau, 1985 and Rao, 1982b) and has properties similar to the
information metric, like invariance. We also give a list of geodesic distances
based on the information metric computed by various authors (Atkinson and Mitchell,
1981; Burbea, 1984; Mitchell and Krzanowski, 1985; Oller and Cuadras, 1985 and
Rao, 1945).

The basic approach adopted in the paper is first to define a measure of
divergence or dissimilarity between two probability measures, and then to use it
to derive a metric on M, the manifold of parameters, by considering two
distributions defined by two contiguous points in M. We thus provide a wider
basis for the construction of an appropriate geometry or geometries on the
parameter space for discussion of practical problems. Some divergence measures
may be more appropriate for discussing properties of estimators using simple loss
functions, while others may be appropriate in the study of population dynamics in
biology. It is not unusual in practice to study a problem under different models
for observed data to examine consistency and robustness of results. The variety
of metrics reported in the paper would be of some use in this direction.

2. JENSEN DIFFERENCE AND ENTROPY DIFFERENTIAL METRIC

Let $\nu$ be a $\sigma$-finite additive measure defined on a $\sigma$-algebra of
subsets of a measurable space $X$, and $P$ be the usual Lebesgue space of
$\nu$-measurable density functions,

(2.1)  $P = \{\, p(x) : p(x) \ge 0,\ x \in X,\ \int_X p(x)\,d\nu(x) = 1 \,\}.$

We call $H : P \to R$ an entropy (functional) on P if

(i) $H(p) = 0$ when p is degenerate,

(ii) $H(p)$ is concave on P.

In such a case, with $\lambda \ge 0$, $\mu \ge 0$, $\lambda + \mu = 1$, Rao (1982a)
defined the Jensen difference between p and q $\in$ P as

(2.2)  $J(\lambda,\mu;\, p,q) = H(\lambda p + \mu q) - \lambda H(p) - \mu H(q).$

The function $J : P \times P \to R$ is non-negative and vanishes if p = q (iff
p = q when H is strictly concave). If the entropy function H is regarded as a
measure of diversity within a population, then the Jensen difference J can be
interpreted as a measure of diversity (or dissimilarity) between two populations.
For the use of the Jensen difference in the measurement, apportionment and
analysis of diversity between populations, the reader is referred to Rao (1982a,
1982b).
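
The definition (2.2) can be illustrated numerically. The following is a minimal sketch, assuming discrete distributions represented as lists of probabilities and taking Shannon's entropy as H; the function names are illustrative and do not come from the paper.

```python
import math


def shannon_entropy(p):
    """Shannon entropy -sum p_i log p_i of a discrete distribution (0 log 0 = 0)."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def jensen_difference(p, q, lam=0.5, entropy=shannon_entropy):
    """J(lam, mu; p, q) = H(lam*p + mu*q) - lam*H(p) - mu*H(q), with mu = 1 - lam."""
    mu = 1.0 - lam
    mixture = [lam * a + mu * b for a, b in zip(p, q)]
    return entropy(mixture) - lam * entropy(p) - mu * entropy(q)


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]
    print(jensen_difference(p, q))           # strictly positive because p != q
    print(jensen_difference(p, p, lam=0.3))  # zero up to rounding error
```

Any other concave functional can be passed through the `entropy` argument, which is how the same sketch would apply to the entropies considered later in this section.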

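Let us now consider a subset of probability densities characterized by a
vector parameter $\theta$,

$P_\theta = \{\, p(x,\theta) : p(x,\theta) \in P,\ \theta \in M,\ \text{a manifold in } R^n \,\},$

and assume that $p(x,\theta)$ is a smooth function admitting derivatives of a certain
order with respect to $\theta$ and differentiation under the integral sign. For
convenience of notation, we write

(2.3)  $p(\cdot,\theta) = p_\theta, \quad H(\theta) = H(p_\theta), \quad H(\theta,\phi) = H(\lambda p_\theta + \mu p_\phi),$
       $J(\theta,\phi) = H(\theta,\phi) - \lambda H(\theta) - \mu H(\phi),$

where $\theta, \phi \in M$. Putting $\phi = \theta + d\theta$ and denoting the i-th
component of a vector with a subscript i, we consider the formal expansion of
$J(\theta,\theta+d\theta)$,

(2.4)  $J(\theta,\theta+d\theta) = \dfrac{1}{2!}\sum_i\sum_j \dfrac{\partial^2 J(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j}\,d\theta_i\,d\theta_j + \dfrac{1}{3!}\sum_i\sum_j\sum_k \dfrac{\partial^3 J(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j\,\partial\phi_k}\,d\theta_i\,d\theta_j\,d\theta_k + \cdots$

       $= \dfrac{1}{2!}\sum_i\sum_j g^H_{ij}(\theta)\,d\theta_i\,d\theta_j + \dfrac{1}{3!}\sum_i\sum_j\sum_k c^H_{ijk}(\theta)\,d\theta_i\,d\theta_j\,d\theta_k + \cdots$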


In (2.4), the coefficients of the first order differentials vanish since
$J(\theta,\phi)$ has a minimum at $\phi = \theta$, and the notation such as
$\partial^2 J(\theta,\phi=\theta)/\partial\phi_i\,\partial\phi_j$ is used for
replacing $\phi$ by $\theta$ after carrying out the indicated differentiations.

From the definition of the J function, it follows that $(g^H_{ij})$ is a
non-negative definite matrix and obeys the tensorial law under transformation
of parameters. We define the matrix and the associated metric

(2.5)  $(g^H_{ij})$  and  $\sum_i\sum_j g^H_{ij}\,d\theta_i\,d\theta_j$

as the H-entropy information matrix and H-entropy differential metric
respectively. We prove the following theorem which provides an alternative
computation of the H-information matrix directly from a given entropy H.

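Theorem 2.1.

(2.6)  $g^H_{ij}(\theta) = -\left.\dfrac{\partial^2 H(\lambda p_\theta + \mu p_\phi)}{\partial\theta_i\,\partial\phi_j}\right|_{\phi=\theta}.$

Proof: By definition

(2.7)  $g^H_{ij}(\theta) = \dfrac{\partial^2 J(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j} = \dfrac{\partial^2 H(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j} - \mu\,\dfrac{\partial^2 H(\theta)}{\partial\theta_i\,\partial\theta_j}.$

Since $J(\theta,\phi)$ attains a minimum at $\phi = \theta$,

(2.8)  $\dfrac{\partial J(\theta,\phi=\theta)}{\partial\phi_j} = \dfrac{\partial H(\theta,\phi=\theta)}{\partial\phi_j} - \mu\,\dfrac{\partial H(\theta)}{\partial\theta_j} = 0.$

Differentiating both sides of (2.8) with respect to $\theta_i$ we have

(2.9)  $\dfrac{\partial^2 H(\theta,\phi=\theta)}{\partial\theta_i\,\partial\phi_j} + \dfrac{\partial^2 H(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j} - \mu\,\dfrac{\partial^2 H(\theta)}{\partial\theta_i\,\partial\theta_j} = 0,$

which, substituted in (2.7), gives (2.6), and the desired result is proved.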

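Let us consider a general entropy function of the type

(2.10)  $H(p_\theta) = -\int h(p_\theta)\,d\nu(x),$

where $h''$ is a non-negative function. Then using (2.6)

(2.11)  $g^H_{ij}(\theta) = \int \left.\dfrac{\partial^2 h(\lambda p_\theta + \mu p_\phi)}{\partial\theta_i\,\partial\phi_j}\right|_{\phi=\theta} d\nu(x) = \lambda\mu \int h''(p_\theta)\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,d\nu(x).$

If $-h(x) = -x\log x$, leading to Shannon's entropy, then

(2.12)  $g^H_{ij} = \lambda\mu \int \dfrac{1}{p_\theta}\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,d\nu(x)$

become, apart from the factor $\lambda\mu$, the elements of Fisher's information
matrix. If $h(x) = (\alpha-1)^{-1}(x^\alpha - x)$, $\alpha \ne 1$, we have the
$\alpha$-order entropy of Havrda and Charvát and

(2.13)  $g^H_{ij} = \alpha\lambda\mu \int p_\theta^{\alpha}\,\dfrac{\partial \log p_\theta}{\partial\theta_i}\,\dfrac{\partial \log p_\theta}{\partial\theta_j}\,d\nu(x),$

which provide the elements of the $\alpha$-order entropy information matrix, and the
corresponding differential metric given in Burbea and Rao (1982a, 1982b).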
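
Theorem 2.1 and the special cases (2.12) and (2.13) can be checked numerically on a simple model. The sketch below assumes the Bernoulli family on {0, 1} with counting measure (a choice made here for illustration, not taken from the paper) and approximates the mixed derivative in (2.6) by central finite differences; all function names are hypothetical.

```python
import math


def bernoulli(theta):
    """Probabilities of the outcomes 0 and 1 under the Bernoulli(theta) model."""
    return [1.0 - theta, theta]


def h_entropy(p, h):
    """H(p) = -sum_x h(p(x)), the discrete analogue of (2.10)."""
    return -sum(h(x) for x in p)


def g_from_mixture(theta, h, lam=0.5, step=1e-4):
    """-d^2 H(lam*p_theta + mu*p_phi) / (dtheta dphi) at phi = theta, by central differences."""
    mu = 1.0 - lam

    def mixed_entropy(t, f):
        mixture = [lam * a + mu * b for a, b in zip(bernoulli(t), bernoulli(f))]
        return h_entropy(mixture, h)

    second = (mixed_entropy(theta + step, theta + step)
              - mixed_entropy(theta + step, theta - step)
              - mixed_entropy(theta - step, theta + step)
              + mixed_entropy(theta - step, theta - step)) / (4.0 * step ** 2)
    return -second


def g_closed_form(theta, h_second, lam=0.5):
    """lam*mu*sum_x h''(p(x)) (dp(x)/dtheta)^2; for Bernoulli, dp/dtheta is -1 or +1."""
    mu = 1.0 - lam
    return lam * mu * sum(h_second(x) for x in bernoulli(theta))


if __name__ == "__main__":
    theta, lam = 0.3, 0.4
    # Shannon case: h(x) = x log x, h''(x) = 1/x, giving lam*mu times Fisher information.
    print(g_from_mixture(theta, lambda x: x * math.log(x), lam),
          g_closed_form(theta, lambda x: 1.0 / x, lam))
    # Havrda-Charvat case with alpha = 2: h(x) = x^2 - x, h''(x) = 2.
    print(g_from_mixture(theta, lambda x: x ** 2 - x, lam),
          g_closed_form(theta, lambda x: 2.0, lam))
```

In each printed pair the finite-difference value and the closed form should agree to several decimal places.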


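We prove Theorem 2.2 which gives alternative expressions for the coefficients
of the third order differentials in the expansion of $J(\theta,\phi)$.

Theorem 2.2.

(2.14)  $c^H_{ijk}(\theta) = -\left[\dfrac{\partial^3 H(\theta,\phi=\theta)}{\partial\theta_i\,\partial\theta_j\,\partial\phi_k} + \dfrac{\partial^3 H(\theta,\phi=\theta)}{\partial\theta_i\,\partial\phi_j\,\partial\phi_k} + \dfrac{\partial^3 H(\theta,\phi=\theta)}{\partial\theta_j\,\partial\phi_i\,\partial\phi_k}\right].$

Proof: By definition

(2.15)  $c^H_{ijk}(\theta) = \dfrac{\partial^3 J(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j\,\partial\phi_k} = \dfrac{\partial^3 H(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j\,\partial\phi_k} - \mu\,\dfrac{\partial^3 H(\theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k}.$

From (2.9), writing j for i and k for j, we have

$\dfrac{\partial^2 H(\theta,\phi=\theta)}{\partial\theta_j\,\partial\phi_k} + \dfrac{\partial^2 H(\theta,\phi=\theta)}{\partial\phi_j\,\partial\phi_k} - \mu\,\dfrac{\partial^2 H(\theta)}{\partial\theta_j\,\partial\theta_k} = 0.$

Differentiating with respect to $\theta_i$,

$\dfrac{\partial^3 H(\theta,\phi=\theta)}{\partial\theta_i\,\partial\theta_j\,\partial\phi_k} + \dfrac{\partial^3 H(\theta,\phi=\theta)}{\partial\theta_j\,\partial\phi_i\,\partial\phi_k} + \dfrac{\partial^3 H(\theta,\phi=\theta)}{\partial\theta_i\,\partial\phi_j\,\partial\phi_k} + \dfrac{\partial^3 H(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j\,\partial\phi_k} - \mu\,\dfrac{\partial^3 H(\theta)}{\partial\theta_i\,\partial\theta_j\,\partial\theta_k} = 0,$

which gives (2.14) as equivalent to (2.15). This proves Theorem 2.2.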

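Let H be Shannon's entropy. Then, an easy computation gives

(2.16)  $c^H_{ijk} = \lambda\mu\,\{[\Gamma^{(1)}_{ijk} + (1-\lambda)T_{ijk}] + [\Gamma^{(1)}_{jki} + (1-\mu)T_{ijk}] + [\Gamma^{(1)}_{ikj} + (1-\mu)T_{ijk}]\}$

where

(2.17)  $\Gamma^{(1)}_{ijk} = E\!\left(\dfrac{\partial^2 \log p_\theta}{\partial\theta_i\,\partial\theta_j}\,\dfrac{\partial \log p_\theta}{\partial\theta_k}\right), \quad T_{ijk} = E\!\left(\dfrac{\partial \log p_\theta}{\partial\theta_i}\,\dfrac{\partial \log p_\theta}{\partial\theta_j}\,\dfrac{\partial \log p_\theta}{\partial\theta_k}\right).$

Adopting the notation of Amari for the $\alpha$-connexion,

$\Gamma^{(\alpha)}_{ijk} = \Gamma^{(1)}_{ijk} + \dfrac{1-\alpha}{2}\,T_{ijk},$

the expression (2.16) can be written

(2.18)  $c^H_{ijk} = \lambda\mu\,[\Gamma^{(2\lambda-1)}_{ijk} + \Gamma^{(2\mu-1)}_{jki} + \Gamma^{(2\mu-1)}_{ikj}].$

When $\lambda = \mu = \tfrac{1}{2}$, (2.18) becomes

(2.19)  $c^H_{ijk} = \tfrac{1}{4}\,[\Gamma^{(0)}_{ijk} + \Gamma^{(0)}_{jki} + \Gamma^{(0)}_{ikj}].$
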

Remark 1. In the definition of the Jensen difference (2.2), we used
a priori probabilities $\lambda$ and $\mu$ for the two probability distributions p and q,
which have some relevance in population studies. But in problems of statistical
inference, a symmetric version may be used by taking $\lambda = \mu = \tfrac{1}{2}$.

Remark 2. Throughout the discussion of this section, it was assumed that
the family of probability distributions admits densities. This was done to
make the computations simple. The problems could, however, be discussed in
greater generality using distribution functions instead of densities.


3. THE QUADRATIC ENTROPY

The quadratic entropy was introduced in Rao (1982a) as a general measure
of diversity of a probability distribution over any measurable space. It is
defined as a function $Q : P \to R_+$,

(3.1)  $Q(p) = \int\!\!\int K(x,y)\,p(x)\,p(y)\,d\nu(x)\,d\nu(y),$

where $K(x,y)$ is symmetric, non-negative and conditionally negative definite, i.e.,

$\sum_{i=1}^{n}\sum_{j=1}^{n} K(x_i,x_j)\,a_i a_j \le 0$

for any choice of $(x_1,\ldots,x_n)$ and of $(a_1,\ldots,a_n)$ such that
$a_1 + \cdots + a_n = 0$, with the further condition $K(x,y) = 0$ if x = y. As
shown in Rao (1982b) and Lau (1985), the quadratic entropy is concave over P and
its Jensen difference has nice convexity properties which make it an ideal
measure of diversity. In view of its usefulness in statistical applications, we
give explicit expressions for the quadratic differential metric and the
connection coefficients associated with the quadratic entropy, in the case of
the parametric family $P_\theta$.

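From Theorem 2.1, the (i,j)-th element of the Q-information matrix is

(3.2)  $g^Q_{ij}(\theta) = -\left.\dfrac{\partial^2 Q(\lambda p_\theta + \mu p_\phi)}{\partial\theta_i\,\partial\phi_j}\right|_{\phi=\theta}.$

Observing that

$Q(\lambda p_\theta + \mu p_\phi) = \int\!\!\int K(x,y)\,[\lambda p(x,\theta) + \mu p(x,\phi)]\,[\lambda p(y,\theta) + \mu p(y,\phi)]\,d\nu(x)\,d\nu(y),$

we find the explicit expression for (3.2) as

(3.3)  $g^Q_{ij}(\theta) = -2\lambda\mu \int\!\!\int K(x,y)\,\dfrac{\partial p(x,\theta)}{\partial\theta_i}\,\dfrac{\partial p(y,\theta)}{\partial\theta_j}\,d\nu(x)\,d\nu(y)$

       $= -2\lambda\mu\, E\!\left[K(x,y)\,\dfrac{\partial \log p(x,\theta)}{\partial\theta_i}\,\dfrac{\partial \log p(y,\theta)}{\partial\theta_j}\right].$

Using the expression (2.14), we find on carrying out the necessary computations

(3.4)  $c^Q_{ijk} = -2\lambda\mu\,(\Gamma_{ijk} + \Gamma_{ikj} + \Gamma_{jki}),$

where

$\Gamma_{ijk} = \int\!\!\int K(x,y)\,\dfrac{\partial p(x,\theta)}{\partial\theta_k}\,\dfrac{\partial^2 p(y,\theta)}{\partial\theta_i\,\partial\theta_j}\,d\nu(x)\,d\nu(y).$

It is of interest to note that the expressions (3.3) and (3.4) are invariant under
transformations of both the parameters and variables.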
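
A small numerical illustration of (3.1) and of the associated Jensen difference may help. The sketch assumes a discrete distribution on a finite set of real points and the kernel K(x, y) = (x - y)^2, which is symmetric, vanishes on the diagonal, and is conditionally negative definite; this kernel and the function names are choices made here for illustration. With this kernel Q(p) equals twice the variance of p, and the Jensen difference equals 2λμ times the squared difference of the two means.

```python
def squared_difference(x, y):
    """Illustrative kernel K(x, y) = (x - y)^2."""
    return (x - y) ** 2


def quadratic_entropy(p, xs, kernel=squared_difference):
    """Q(p) = sum_x sum_y K(x, y) p(x) p(y), the discrete form of (3.1)."""
    return sum(kernel(x, y) * px * py
               for x, px in zip(xs, p) for y, py in zip(xs, p))


def jensen_difference_q(p, q, xs, lam=0.5, kernel=squared_difference):
    """Jensen difference (2.2) with H taken to be the quadratic entropy Q."""
    mu = 1.0 - lam
    mixture = [lam * a + mu * b for a, b in zip(p, q)]
    return (quadratic_entropy(mixture, xs, kernel)
            - lam * quadratic_entropy(p, xs, kernel)
            - mu * quadratic_entropy(q, xs, kernel))


def mean(p, xs):
    return sum(x * px for x, px in zip(xs, p))


def variance(p, xs):
    m = mean(p, xs)
    return sum((x - m) ** 2 * px for x, px in zip(xs, p))


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]
    lam = 0.3
    print(quadratic_entropy(p, xs), 2.0 * variance(p, xs))
    print(jensen_difference_q(p, q, xs, lam),
          2.0 * lam * (1.0 - lam) * (mean(p, xs) - mean(q, xs)) ** 2)
```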


4. METRICS BASED ON DIVERGENCE MEASURES

Burbea and Rao (1982a, 1982b), Burbea (1984) and Eguchi (1984) have considered
metrics arising out of a variety of divergence measures between probability
distributions. A typical divergence measure is of the form

(4.1)  $D_F(p_\theta, p_\phi) = \int F[p(x,\theta),\, p(x,\phi)]\,d\nu(x),$

where F satisfies the following conditions:

(i) $F(\cdot,\cdot)$ is a $C^3$-function on $R_+ \times R_+$,

(ii) $F(x,\cdot)$ is strictly convex on $R_+$ for every $x \in R_+$,

(iii) $F(x,x) = 0$ for every $x \in R_+$,

(iv) $\dfrac{\partial F(x,y=x)}{\partial y} = \text{constant}$ for every $x \in R_+$.

Let us consider the expansion

(4.2)  $D_F(p_\theta, p_{\theta+d\theta}) = \dfrac{1}{2!}\sum_i\sum_j g^F_{ij}(\theta)\,d\theta_i\,d\theta_j + \dfrac{1}{3!}\sum_i\sum_j\sum_k c^F_{ijk}(\theta)\,d\theta_i\,d\theta_j\,d\theta_k + \cdots$

and obtain explicit expressions for $g^F_{ij}$ and $c^F_{ijk}$.

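Theorem 4.1. Let

$F_1(x,y) = \dfrac{\partial F(x,y)}{\partial x}, \quad F_2(x,y) = \dfrac{\partial F(x,y)}{\partial y},$

$F_{11} = \dfrac{\partial^2 F(x,y)}{\partial x^2}, \quad F_{12} = \dfrac{\partial^2 F(x,y)}{\partial x\,\partial y}, \quad F_{22} = \dfrac{\partial^2 F(x,y)}{\partial y^2}, \quad F_{222} = \dfrac{\partial^3 F(x,y)}{\partial y^3}.$

Then

(i)  $g^F_{ij}(\theta) = \int F_{22}[p_\theta,p_\theta]\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,d\nu(x) = \int F_{11}[p_\theta,p_\theta]\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,d\nu(x) = -\int F_{12}[p_\theta,p_\theta]\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,d\nu(x).$

(ii)  $c^F_{ijk}(\theta) = \int F_{222}[p_\theta,p_\theta]\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,\dfrac{\partial p_\theta}{\partial\theta_k}\,d\nu(x)$

      $+ \int F_{22}[p_\theta,p_\theta]\left[\dfrac{\partial^2 p_\theta}{\partial\theta_i\,\partial\theta_j}\,\dfrac{\partial p_\theta}{\partial\theta_k} + \dfrac{\partial^2 p_\theta}{\partial\theta_i\,\partial\theta_k}\,\dfrac{\partial p_\theta}{\partial\theta_j} + \dfrac{\partial^2 p_\theta}{\partial\theta_j\,\partial\theta_k}\,\dfrac{\partial p_\theta}{\partial\theta_i}\right] d\nu(x).$

The results are established by straightforward computations.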

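Let us consider the directed divergence measure of Csiszár (1967), which
plays an important role in problems of statistical inference,

(4.3)  $D_f(p_\theta, p_\phi) = \int p(x,\theta)\, f\!\left\{\dfrac{p(x,\phi)}{p(x,\theta)}\right\} d\nu(x),$

where f is a convex function. In this case

(4.4)  $g^f_{ij}(\theta) = \left.\dfrac{\partial^2 D_f}{\partial\phi_i\,\partial\phi_j}\right|_{\phi=\theta} = f''(1)\,g_{ij}(\theta),$

where $g_{ij}$ are the elements of Fisher's information matrix. Thus a wide class
of invariant divergence measures provide the same informative geometry on the
parameter manifold. Further,

$c^f_{ijk}(\theta) = \left.\dfrac{\partial^3 D_f}{\partial\phi_i\,\partial\phi_j\,\partial\phi_k}\right|_{\phi=\theta} = f''(1)\,[\Gamma^{(1)}_{ijk} + \Gamma^{(1)}_{ikj} + \Gamma^{(1)}_{jki}] + (f'''(1) + 3f''(1))\,T_{ijk},$

where $\Gamma^{(1)}_{ijk}$ and $T_{ijk}$ are as defined in (2.17).
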

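If f is a convex function, then

$f^*(u) = u\,f(1/u)$

is also convex, and the measure (4.3) associated with $f + f^*$ is

(4.5)  $D(p_\theta, p_\phi) = \int \left[p_\theta\, f\!\left(\dfrac{p_\phi}{p_\theta}\right) + p_\phi\, f\!\left(\dfrac{p_\theta}{p_\phi}\right)\right] d\nu(x),$

which is symmetric in $\theta$ and $\phi$. However, we may define (4.5) as a symmetric
divergence measure without requiring f to be a convex function but satisfying
the condition that $x f(x^{-1}) + f(x)$ is non-negative on $R_+$. In such a case

$g^f_{ij}(\theta) = 2f''(1)\,g_{ij}(\theta),$

$c^f_{ijk}(\theta) = 2f''(1)\,[\Gamma^{(1)}_{ijk} + \Gamma^{(1)}_{ikj} + \Gamma^{(1)}_{jki}] + 3f''(1)\,T_{ijk}.$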
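
The statement in (4.4), that every smooth f-divergence yields f''(1) times the Fisher information, can be checked numerically. The sketch below assumes the Bernoulli family with counting measure (chosen here for illustration) and three generators: f(u) = u log u with f''(1) = 1, f(u) = (sqrt(u) - 1)^2 with f''(1) = 1/2, and f(u) = (u - 1)^2 with f''(1) = 2. The function names are hypothetical.

```python
import math


def bernoulli(theta):
    """Probabilities of the outcomes 0 and 1 under the Bernoulli(theta) model."""
    return [1.0 - theta, theta]


def f_divergence(theta, phi, f):
    """D_f(p_theta, p_phi) = sum_x p(x, theta) f(p(x, phi) / p(x, theta)), as in (4.3)."""
    return sum(a * f(b / a) for a, b in zip(bernoulli(theta), bernoulli(phi)))


def second_derivative_at_diagonal(theta, f, step=1e-4):
    """Central-difference estimate of d^2 D_f / dphi^2 evaluated at phi = theta."""
    return (f_divergence(theta, theta + step, f)
            - 2.0 * f_divergence(theta, theta, f)
            + f_divergence(theta, theta - step, f)) / step ** 2


def fisher_information(theta):
    """Fisher information of the Bernoulli(theta) model."""
    return 1.0 / (theta * (1.0 - theta))


GENERATORS = {
    "u log u": (lambda u: u * math.log(u), 1.0),
    "(sqrt(u) - 1)^2": (lambda u: (math.sqrt(u) - 1.0) ** 2, 0.5),
    "(u - 1)^2": (lambda u: (u - 1.0) ** 2, 2.0),
}

if __name__ == "__main__":
    theta = 0.3
    for name, (f, f2_at_1) in GENERATORS.items():
        print(name, second_derivative_at_diagonal(theta, f),
              f2_at_1 * fisher_information(theta))
```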


5. OTHER DIVERGENCE MEASURES

In the last section, we considered the f-divergence measure which led to
the Fisher information metric. A special case of this measure is the city block
distance, or the overlap distance (see Rao, 1948, 1982a),

(5.1)  $D_0(p_\theta, p_\phi) = \int |p(x,\theta) - p(x,\phi)|\,d\nu(x),$

obtained by choosing $f(x) = 1 - \min(x,1)$, which admits a direct interpretation
in terms of errors of classification in discrimination problems. However, this
is not a smooth function and no formula of the type (4.7) is available to
determine the coefficients of the differential metric. But in some cases, it
may turn out that


$D_0^2(p_\theta, p_\phi)$

is a smooth function of $\theta$ and $\phi$, in which case

(5.2)  $g_{ij} = \dfrac{\partial^2 D_0^2(\theta,\phi=\theta)}{\partial\phi_i\,\partial\phi_j}.$

In the case when $p(x,\theta)$ is a p-variate normal density with mean $\mu$ and fixed
variance covariance matrix $\Sigma$, the coefficient (5.2) can be easily computed to be
proportional to the (i,j)-th element of $\Sigma^{-1}$, which is indeed the (i,j)-th
element of the Fisher information matrix. It would be of interest to investigate
the nature of the metric induced by (5.1) in the general case.

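Let $p(x,\theta)$ be the density of a uniform distribution in the interval $[0,\theta]$.
Then it is seen that

(5.3)  $D_0(\theta,\phi) = 2\left(1 - \dfrac{\theta}{\phi}\right)$ if $\theta \le \phi$,

       $D_0(\theta,\phi) = 2\left(1 - \dfrac{\phi}{\theta}\right)$ if $\theta \ge \phi$.

Although this is not a differentiable function, it is seen that

$ds^2 = \dfrac{4\,d\theta^2}{\theta^2}$

is the metric associated with (5.3).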
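
One way to read this statement, assuming the metric is obtained as the limiting ratio of the squared distance to the squared parameter increment, is sketched below. The function name and the choice of increments are illustrative only.

```python
def d0_uniform(theta, phi):
    """L1 distance between the uniform densities on [0, theta] and [0, phi], as in (5.3)."""
    lo, hi = min(theta, phi), max(theta, phi)
    return 2.0 * (1.0 - lo / hi)


if __name__ == "__main__":
    theta = 2.0
    for increment in (1e-2, 1e-3, 1e-4):
        ratio = d0_uniform(theta, theta + increment) ** 2 / increment ** 2
        # The ratio should approach 4 / theta^2 as the increment shrinks.
        print(increment, ratio, 4.0 / theta ** 2)
```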

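Another general divergence measure which has some practical applications is

$D_\psi(p_\theta, p_\phi) = \int [\psi(p_\theta) - \psi(p_\phi)]^2\,d\nu(x),$

which is indeed a smooth function if $\psi$ is so. In this case

$g^\psi_{ij}(\theta) = 2\int [\psi'(p_\theta)]^2\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,d\nu(x),$

$c^\psi_{ijk}(\theta) = 6\int \psi'(p_\theta)\,\psi''(p_\theta)\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,\dfrac{\partial p_\theta}{\partial\theta_k}\,d\nu(x)$

$\qquad + 2\int [\psi'(p_\theta)]^2\left(\dfrac{\partial^2 p_\theta}{\partial\theta_i\,\partial\theta_j}\,\dfrac{\partial p_\theta}{\partial\theta_k} + \dfrac{\partial^2 p_\theta}{\partial\theta_i\,\partial\theta_k}\,\dfrac{\partial p_\theta}{\partial\theta_j} + \dfrac{\partial^2 p_\theta}{\partial\theta_j\,\partial\theta_k}\,\dfrac{\partial p_\theta}{\partial\theta_i}\right) d\nu(x).$
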

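Another measure of interest is the cross entropy introduced in Rao and
Nayak (1985). If H is any entropy function, then the cross entropy of $p_\theta$ with
respect to $p_\phi$ was defined as

(5.4)  $D(p_\theta, p_\phi) = H(p_\phi) - H(p_\theta) + \lim_{\lambda\to 0} \dfrac{H[p_\phi + \lambda(p_\theta - p_\phi)] - H(p_\phi)}{\lambda}.$

Let

$H(p) = -\int h(p)\,d\nu(x)$

as chosen in (2.10). Then (5.4) reduces to

$D(p_\theta, p_\phi) = -\int h(p_\phi)\,d\nu(x) - \int h'(p_\phi)\,(p_\theta - p_\phi)\,d\nu(x) + \int h(p_\theta)\,d\nu(x),$

and

$g^h_{ij} = \int h''(p_\theta)\,\dfrac{\partial p_\theta}{\partial\theta_i}\,\dfrac{\partial p_\theta}{\partial\theta_j}\,d\nu(x),$

which is the same as the h-entropy information matrix derived in (2.11), apart
from a constant. Similarly

$c^h_{ijk} = \Gamma^h_{ijk} + \Gamma^h_{ikj} + \Gamma^h_{jki} + T^h_{ijk},$

where

$\Gamma^h_{ijk} = E\!\left\{p_\theta\,h''(p_\theta)\,\dfrac{\partial^2 \log p_\theta}{\partial\theta_i\,\partial\theta_j}\,\dfrac{\partial \log p_\theta}{\partial\theta_k}\right\},$

$T^h_{ijk} = E\!\left\{[3p_\theta\,h''(p_\theta) + 2p_\theta^2\,h'''(p_\theta)]\,\dfrac{\partial \log p_\theta}{\partial\theta_i}\,\dfrac{\partial \log p_\theta}{\partial\theta_j}\,\dfrac{\partial \log p_\theta}{\partial\theta_k}\right\}.$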
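
The reduction of the cross entropy to an explicit integral can be illustrated for discrete distributions. The sketch below assumes (5.4) is read as H(p_φ) - H(p_θ) plus the directional derivative of H at p_φ in the direction p_θ - p_φ, approximates that derivative with a small finite λ, and takes h(x) = x log x (Shannon's entropy), for which the reduced form coincides with the Kullback-Leibler divergence. The function names are hypothetical.

```python
import math


def h_entropy(p, h):
    """H(p) = -sum_x h(p(x)), the discrete analogue of (2.10)."""
    return -sum(h(x) for x in p)


def cross_entropy_limit(p, q, h, lam=1e-6):
    """H(q) - H(p) + [H(q + lam*(p - q)) - H(q)] / lam, a finite-lam stand-in for (5.4)."""
    shifted = [b + lam * (a - b) for a, b in zip(p, q)]
    return (h_entropy(q, h) - h_entropy(p, h)
            + (h_entropy(shifted, h) - h_entropy(q, h)) / lam)


def cross_entropy_reduced(p, q, h, h_prime):
    """sum_x [h(p) - h(q) - h'(q) (p - q)], the reduced form of (5.4)."""
    return sum(h(a) - h(b) - h_prime(b) * (a - b) for a, b in zip(p, q))


def kullback_leibler(p, q):
    """sum_x p(x) log(p(x) / q(x)) for strictly positive p and q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]
    h = lambda x: x * math.log(x)
    h_prime = lambda x: math.log(x) + 1.0
    print(cross_entropy_limit(p, q, h))
    print(cross_entropy_reduced(p, q, h, h_prime))
    print(kullback_leibler(p, q))
```

The three printed values should agree closely, the first only up to the error of the finite-λ approximation.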


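6. GEODESIC DISTANCES

In Rao (1945) it was suggested that the information metric could be used
to obtain the geodesic distances between probability distributions. Given any
quadratic differential metric

(6.1)  $ds^2 = \sum_i\sum_j g_{ij}(\theta)\,d\theta_i\,d\theta_j,$

where the matrix $(g_{ij})$ is positive definite, the geodesic curve $\theta = \theta(t)$
can be determined from the Euler-Lagrange equations

(6.2)  $\sum_{i=1}^{n} g_{ik}\,\ddot\theta_i + \sum_{i=1}^{n}\sum_{j=1}^{n} \Gamma_{ijk}\,\dot\theta_i\,\dot\theta_j = 0, \quad k = 1,\ldots,n,$

and from the boundary conditions

$\theta(t_1) = \theta, \quad \theta(t_2) = \phi.$

In (6.2), the quantity $\Gamma_{ijk}$ is given by

(6.3)  $\Gamma_{ijk} = \dfrac{1}{2}\left[\dfrac{\partial}{\partial\theta_i}\,g_{jk} + \dfrac{\partial}{\partial\theta_j}\,g_{ki} - \dfrac{\partial}{\partial\theta_k}\,g_{ij}\right]$

and is known as the "Christoffel symbol of the first kind".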

By definition of the geodesic curve $\theta = \theta(t)$, its tangent vector
$\dot\theta = \dot\theta(t)$ is of constant length with respect to the metric $ds^2$. Thus

(6.4)  $\sum_i\sum_j g_{ij}(\theta)\,\dot\theta_i\,\dot\theta_j = \text{constant}.$

The constant may be chosen to be of value 1 when the curve parameter t is the
arc length parameter s, $0 \le s \le s_0$, with $\theta(0) = \theta$, $\theta(s_0) = \phi$, and
$s_0 = g(\theta,\phi)$ is the geodesic distance between $\theta$ and $\phi$.
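
For a family with a single real parameter the arc length between θ and φ is the integral of the square root of g(θ) over the interval joining them, so the geodesic distance can be obtained by simple quadrature. The sketch below assumes that setting and uses the composite Simpson rule; the function names are illustrative. The Fisher information functions used (1/θ for the Poisson mean, 1/θ² for the exponential rate, n/(θ(1-θ)) for the binomial probability) are standard facts rather than results stated in the text above.

```python
import math


def geodesic_distance_1d(info, a, b, n=2000):
    """Integrate sqrt(info(t)) over [min(a, b), max(a, b)] by the composite Simpson rule.

    `info` is the scalar Fisher information g(t); `n` must be even.
    """
    lo, hi = min(a, b), max(a, b)
    width = (hi - lo) / n
    total = math.sqrt(info(lo)) + math.sqrt(info(hi))
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * math.sqrt(info(lo + i * width))
    return total * width / 3.0


def poisson_info(t):
    return 1.0 / t


def exponential_info(t):
    return 1.0 / t ** 2


def binomial_info(t, n_trials=10):
    return n_trials / (t * (1.0 - t))


if __name__ == "__main__":
    # Compare quadrature with the closed forms 2|sqrt(a) - sqrt(b)| and |log a - log b|.
    print(geodesic_distance_1d(poisson_info, 1.0, 4.0),
          2.0 * abs(math.sqrt(1.0) - math.sqrt(4.0)))
    print(geodesic_distance_1d(exponential_info, 1.0, math.e),
          abs(math.log(1.0) - math.log(math.e)))
    print(geodesic_distance_1d(binomial_info, 0.2, 0.7),
          2.0 * math.sqrt(10) * abs(math.asin(math.sqrt(0.2)) - math.asin(math.sqrt(0.7))))
```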

Atkinson and Mitchell (1981) describe two other methods of deriving geodesic
distances starting from a given differential metric. The distances obtained
by these authors in various cases are given below. In each case we give the
probability function $p(x,\theta)$ and the associated geodesic distance $g(\theta,\phi)$ based
on the Fisher information metric.

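(1) Poisson distribution

$p(x,\theta) = e^{-\theta}\theta^{x}/x!, \quad x = 0, 1, \ldots$
$g(\theta,\phi) = 2\,|\sqrt{\theta} - \sqrt{\phi}|$

(2) Binomial distribution (n fixed)

$p(x,\theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x}, \quad x = 0, 1, \ldots, n$
$g(\theta,\phi) = 2\sqrt{n}\,|\sin^{-1}\sqrt{\theta} - \sin^{-1}\sqrt{\phi}|$
$\qquad\quad = 2\sqrt{n}\,\cos^{-1}\{\sqrt{\theta\phi} + \sqrt{(1-\theta)(1-\phi)}\}.$

(3) Exponential distribution

$p(x,\theta) = \theta e^{-x\theta}, \quad x \ge 0$
$g(\theta,\phi) = |\log\theta - \log\phi|.$

(4) Gamma distribution (n fixed)

$p(x,\theta) = \theta^{n}[\Gamma(n)]^{-1}x^{n-1}e^{-x\theta}, \quad x > 0$
$g(\theta,\phi) = \sqrt{n}\,|\log\theta - \log\phi|$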

(5) Normal distribution

$p(x,\mu,\sigma_0^2) = N(\mu,\sigma_0^2;x), \quad \sigma_0$ fixed
$g(\mu_1,\mu_2) = |\mu_1 - \mu_2|/\sigma_0$

(6) Normal distribution

$p(x,\mu_0,\sigma^2) = N(\mu_0,\sigma^2;x), \quad \mu_0$ fixed
$g(\sigma_1^2,\sigma_2^2) = \sqrt{2}\,|\log\sigma_1 - \log\sigma_2|$

(7) Normal distribution

$p(x,\mu,\sigma^2) = N(\mu,\sigma^2;x), \quad \mu$ and $\sigma$ both variable.

The information metric in this case is

(6.5)  $ds^2 = \dfrac{d\mu^2 + 2\,d\sigma^2}{\sigma^2}$

and the geodesic distance is

(6.6)  $g(\mu_1,\sigma_1^2;\mu_2,\sigma_2^2) = \sqrt{2}\,\log\dfrac{1+\delta(1,2)}{1-\delta(1,2)} = 2\sqrt{2}\,\tanh^{-1}\delta(1,2),$

where $\delta(1,2)$ is the positive square root of

$\dfrac{(\mu_1-\mu_2)^2 + 2(\sigma_1-\sigma_2)^2}{(\mu_1-\mu_2)^2 + 2(\sigma_1+\sigma_2)^2}.$

The explicit form (6.6) is given in Burbea and Rao (1982a). From (6.6)

$g(\mu,\sigma_1^2;\mu,\sigma_2^2) = \sqrt{2}\,|\log\sigma_1 - \log\sigma_2|,$

which agrees with result (6). However, $g(\mu_1,\sigma^2;\mu_2,\sigma^2)$ does not reduce
to result (5), since $\sigma$ = constant is not a geodesic curve with respect to the
metric (6.5).

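(8) Multivariate normal distribution

$N_p(\mu,\Sigma;x), \quad \Sigma$ fixed
$g(\mu_1,\mu_2) = [(\mu_1-\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2)]^{1/2},$

which is Mahalanobis distance.

(9) Multivariate normal distribution

$N_p(\mu,\Sigma;x), \quad \mu$ fixed
$g(\Sigma_1,\Sigma_2) = \left[2^{-1}\sum_{i=1}^{p}(\log\lambda_i)^2\right]^{1/2},$

where $0 < \lambda_1 \le \cdots \le \lambda_p$ are the roots of the determinantal equation
$|\Sigma_2 - \lambda\Sigma_1| = 0$. The above explicit form is due to S.T. Jensen as
mentioned in Atkinson and Mitchell (1981).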

(10) Negative binomial distribution

$p(x,\theta) = [x!\,\Gamma(r)]^{-1}\,\Gamma(x+r)\,\theta^{x}(1-\theta)^{r}, \quad r$ fixed

$g(\theta,\phi) = 2\sqrt{r}\,\cosh^{-1}\dfrac{1-\sqrt{\theta\phi}}{\sqrt{(1-\theta)(1-\phi)}} = 2\sqrt{r}\,\log\dfrac{1-\sqrt{\theta\phi} + |\sqrt{\theta}-\sqrt{\phi}|}{\sqrt{(1-\theta)(1-\phi)}}.$

This computation is due to Oller and Cuadras (1985).


(11) Multinomial distribution

$p(n_1,\ldots,n_k;\pi_1,\ldots,\pi_k) = \dfrac{n!}{n_1!\cdots n_k!}\,\pi_1^{n_1}\cdots\pi_k^{n_k}, \quad n$ fixed.

Let $\pi_1 = (\pi_{11},\ldots,\pi_{k1})$ and $\pi_2 = (\pi_{12},\ldots,\pi_{k2})$. Then

$g(\pi_1,\pi_2) = 2\sqrt{n}\,\cos^{-1}\!\left(\sum_{i=1}^{k}\sqrt{\pi_{i1}\pi_{i2}}\right).$

The above computation was originally done by Rao (1945), but an easier method
of derivation is given by Atkinson and Mitchell (1981).

Recently Burbea (1984) obtained geodesic distances in the case of independent
Poisson and Normal distributions which are given below.

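(12) Independent Poisson distributions

$p(x_1,\ldots,x_n;\theta_1,\ldots,\theta_n) = \prod_{i=1}^{n} e^{-\theta_i}\,\theta_i^{x_i}/x_i!$

$g(\theta_1,\ldots,\theta_n;\phi_1,\ldots,\phi_n) = 2\left[\sum_{i=1}^{n}(\sqrt{\theta_i} - \sqrt{\phi_i})^2\right]^{1/2}$

(13) Independent Normal distributions

$N(x_1;\mu_1,\sigma_1^2)\cdots N(x_n;\mu_n,\sigma_n^2)$

$g[(\mu_{11},\sigma_{11}^2),\ldots,(\mu_{n1},\sigma_{n1}^2);\,(\mu_{12},\sigma_{12}^2),\ldots,(\mu_{n2},\sigma_{n2}^2)] = \sqrt{2}\left[\sum_{k=1}^{n}\log^2\dfrac{1+\delta_k(1,2)}{1-\delta_k(1,2)}\right]^{1/2},$

where $\delta_k(1,2)$ is the positive square root of

$\dfrac{(\mu_{k1}-\mu_{k2})^2 + 2(\sigma_{k1}-\sigma_{k2})^2}{(\mu_{k1}-\mu_{k2})^2 + 2(\sigma_{k1}+\sigma_{k2})^2}.$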


(14) Multivariate elliptic distributions

$p(x\,|\,\mu,\Sigma) = |\Sigma|^{-1/2}\,h[(x-\mu)'\,\Sigma^{-1}(x-\mu)],$

for some function h, and $\Sigma$ is fixed

$g(\mu_1,\mu_2) = [c_h\,(\mu_1-\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2)]^{1/2},$

where $c_h$ is a constant, which is essentially Mahalanobis distance. This result
is due to Mitchell and Krzanowski (1985).

The use of the $c_{ijk}$ coefficients defined in (2.4) and (4.2) in the discussion
of statistical problems will be considered in a future communication.
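
The closed form (6.6) for the univariate normal family, read as 2√2 tanh⁻¹ δ(1,2) with δ(1,2) the square root of [(μ₁-μ₂)² + 2(σ₁-σ₂)²] / [(μ₁-μ₂)² + 2(σ₁+σ₂)²], can be checked against the special cases listed in (5) and (6). The following is a minimal sketch under that reading; the function names are illustrative.

```python
import math


def delta(mu1, sigma1, mu2, sigma2):
    """Positive square root of the ratio appearing below (6.6)."""
    numerator = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    denominator = (mu1 - mu2) ** 2 + 2.0 * (sigma1 + sigma2) ** 2
    return math.sqrt(numerator / denominator)


def normal_geodesic_distance(mu1, sigma1, mu2, sigma2):
    """Geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2) under the metric (6.5)."""
    return 2.0 * math.sqrt(2.0) * math.atanh(delta(mu1, sigma1, mu2, sigma2))


if __name__ == "__main__":
    # Equal means: agrees with sqrt(2) |log sigma1 - log sigma2| from result (6).
    print(normal_geodesic_distance(0.0, 1.0, 0.0, 3.0), math.sqrt(2.0) * math.log(3.0))
    # Equal variances: shorter than |mu1 - mu2| / sigma from result (5),
    # because the line sigma = constant is not a geodesic of (6.5).
    print(normal_geodesic_distance(0.0, 1.0, 1.0, 1.0), 1.0)
```

For nearby means with a common σ the two values in the second comparison approach each other, which is consistent with both metrics agreeing infinitesimally along the μ direction.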